💡 in some very simple settings, we can use conjugate priors that make updating Prior ➡️ Posterior easy (e.g. the beta-binomial example we did by hand in class)
⚠️ in most other cases, we need to use MCMC to approximate our posterior distribution, and end up with draws from the posterior
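The conjugate case really is easy enough to do in a few lines of base R. The prior hyperparameters and data below are made up for illustration: with a Beta(a, b) prior on \(\theta\) and \(y\) successes in \(n\) trials, the posterior is Beta(a + y, b + n − y) — no MCMC needed.

```r
# conjugate beta-binomial updating, Prior -> Posterior in closed form
a <- 2; b <- 2          # Beta(2, 2) prior (hyperparameters assumed for illustration)
y <- 7; n <- 10         # observed data: 7 successes in 10 trials

a_post <- a + y         # 9
b_post <- b + n - y     # 5

# posterior summaries come straight from the Beta distribution
post_mean <- a_post / (a_post + b_post)
ci <- qbeta(c(0.025, 0.975), a_post, b_post)   # 95% credible interval
post_mean
ci
```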
Overview
We’ve spent a lot of time on how to interpret statistical tests and parameter estimates from Bayesian models, and even some time on how to fit these models. But what do we do with the outputs of the models?
Manipulating Posterior Draws
Marginal Effects
Manipulating Posterior Draws
With modern Bayesian methods, we usually use a sampling algorithm like MCMC to get draws from our posterior distribution.
draw: one individual sample from the posterior distribution; contains one value per parameter in \(\theta\)
chain: the Markov chain used to generate samples of \(\theta\); each iteration in the chain produces one draw
Manipulating Posterior Draws
If we’ve reached convergence, the draws will be draws from \(p(\theta \mid x)\) (our posterior) and we can use them to approximate the posterior distribution itself.
Manipulating Posterior Draws
While software like rstan and brms will often provide default summaries (mean, 95% credible interval, etc.) of the posterior draws, it can be useful to know how to manipulate them yourself to calculate custom quantities:
approximate the probability of \(\theta\) being in a custom range
diagnose chain-specific issues
use a loss function \(f(x)\) to approximate the risk/reward of your estimates being wrong
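As a sketch of the chain-level idea, here are simulated draws with a `.chain` column standing in for real posterior draws (the values are made up for illustration). If the sampler has mixed, per-chain summaries should agree closely:

```r
# simulated stand-in for posterior draws: 4 chains x 1000 iterations
set.seed(5)
draws <- data.frame(
  .chain        = rep(1:4, each = 1000),
  b_sleep_hours = rnorm(4000, mean = 1.5, sd = 0.5)
)

# custom probability from the pooled draws, e.g. P(slope > 1)
mean(draws$b_sleep_hours > 1)

# per-chain means: large disagreements between chains signal convergence problems
tapply(draws$b_sleep_hours, draws$.chain, mean)
```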
Manipulating Posterior Draws
Use the data here to analyze the relationship between various personal/demographic variables and students’ standardized test scores. Fit a Bayesian Mixed Effect GLM appropriate for the question and use the output and ggplots to answer the questions below.
Manipulating Posterior Draws
💡 Remember, each draw from the posterior is one sample from the multivariate posterior distribution: you’ll get one value for each parameter in your model.
🧠 think of each draw as a universe in the multi-verse: it’s one reasonable way the world could look based on your posterior*
*also keep this in mind: this is why we do calculations on the draw level first, then aggregate. e.g., to get the posterior of a derived quantity like BMI, compute it within each draw first, then summarize.
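A small simulated example (fake draws standing in for a real posterior) shows the draw-level-first workflow: compute the derived quantity within each draw, *then* aggregate. The coefficient values here are invented for illustration:

```r
# simulated posterior draws for an intercept and a sleep slope
set.seed(1)
draws <- data.frame(
  b_Intercept   = rnorm(4000, mean = 50, sd = 2),
  b_sleep_hours = rnorm(4000, mean = 1.5, sd = 0.5)
)

# derived quantity PER DRAW: predicted mean score for a student sleeping 8 hours
pred_8h <- draws$b_Intercept + 8 * draws$b_sleep_hours

# THEN aggregate: posterior mean, credible interval, and a custom probability
mean(pred_8h)
quantile(pred_8h, c(0.025, 0.975))
mean(pred_8h > 60)   # P(predicted mean score > 60)
```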
Manipulating Posterior Draws
```r
library(brms)
library(tidyverse)
library(bayesplot)
library(tidybayes)

data <- read.csv("/Users/chelseaparlett/Desktop/CPSC540ParlettPellerti/Data/23cw2.csv")

# fit model
mm <- brm(
  standardized_test_score ~ hair_color + sleep_hours + grade_level +
    caloric_intake_per_day + (1 | school_district),
  data = data
)

# print out variable names
get_variables(mm)

# get raw draws
draws <- mm %>% spread_draws(b_sleep_hours)

# custom calc
mean(draws$b_sleep_hours > 5)
```
Extracting Posterior Draws
the spread_draws() function from tidybayes extracts the raw draws from a Bayesian model and puts them in a tidy format.
```r
draws <- mm %>% spread_draws(b_sleep_hours)
glimpse(draws)
```
we can also grab multiple parameters of different types and manipulate them to get quantities we’re interested in.
```r
# get Random Intercept Draws
draws <- mm %>%
  spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept + r_school_district)
head(draws)
```
these are often values that are more interpretable and reportable to a general audience. For example, instead of reporting a posterior for the Intercept and a posterior for each District offset, we report District mean posteriors.
```r
# get Random Intercept Draws
draws <- mm %>%
  spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept + r_school_district)

ggplot(draws, aes(y = District, x = District_mean)) +
  stat_halfeye()
```
Extracting Posterior Draws
This also allows us to add custom references to the plots, like a 95% credible interval for the overall Intercept.
```r
# get Random Intercept Draws
draws <- mm %>%
  spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept + r_school_district)

# 95% and 50% intervals for the overall Intercept
mean_overall <- median_qi(
  draws %>% ungroup(),
  intercept_mean = b_Intercept,
  .width = c(.95, .50)
)

ggplot(draws, aes(y = District, x = District_mean)) +
  stat_halfeye() +
  geom_vline(xintercept = mean_overall[[1, "intercept_mean"]], color = "red") +
  annotate(
    geom = "rect",
    xmin = mean_overall[[1, ".lower"]],
    xmax = mean_overall[[1, ".upper"]],
    ymin = 0, ymax = 6,
    fill = "red", alpha = 0.2
  )
```
Summarizing Posterior Draws
Again, we can use these draws to get values and summaries that are easier for most people to understand.
```r
# get Random Intercept Draws
draws <- mm %>%
  spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept + r_school_district)

draws |>
  median_qi(District_mean = b_Intercept + r_school_district,
            .width = c(0.89, 0.5))
```
# A tibble: 10 × 7
District District_mean .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 District.A 52.4 47.9 57.0 0.89 median qi
2 District.B 48.4 43.9 52.9 0.89 median qi
3 District.C 55.0 50.5 59.5 0.89 median qi
4 District.D 60.3 55.7 65.0 0.89 median qi
5 District.E 45.0 40.4 49.5 0.89 median qi
6 District.A 52.4 50.5 54.4 0.5 median qi
7 District.B 48.4 46.5 50.3 0.5 median qi
8 District.C 55.0 53.0 56.8 0.5 median qi
9 District.D 60.3 58.3 62.3 0.5 median qi
10 District.E 45.0 43.0 46.8 0.5 median qi
Visualizing Posterior Draws
We want to know whether the posteriors of mean scores for different hair colors are different, and if any of them provide evidence that scores are “average” (between 45 and 55).
# A tibble: 3 × 5
.chain .iteration .draw name value
<int> <int> <int> <chr> <dbl>
1 1 1 1 Black 50.7
2 1 1 1 Brown 51.3
3 1 1 1 Blonde 50.5
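As a sketch of one way to answer this, here is the plot-and-summarize pattern with *simulated* draws standing in for the real long-format tibble above (the means are loosely based on the values shown; the fitted model itself isn’t reproduced here):

```r
library(dplyr)
library(ggplot2)
library(tidybayes)

# simulated stand-in for the long-format hair color draws shown above
set.seed(2)
hair_draws <- tibble(
  .draw = rep(1:4000, times = 3),
  name  = rep(c("Black", "Brown", "Blonde"), each = 4000),
  value = c(rnorm(4000, 50.7, 1), rnorm(4000, 51.3, 1), rnorm(4000, 50.5, 1))
)

# posteriors for each hair color, with the "average" band (45-55) as a reference
ggplot(hair_draws, aes(x = value, y = name)) +
  annotate("rect", xmin = 45, xmax = 55, ymin = -Inf, ymax = Inf,
           fill = "steelblue", alpha = 0.2) +
  stat_halfeye() +
  labs(x = "Posterior mean score", y = "Hair color")

# draw-level probability that each hair color's mean lies inside the band
hair_draws %>%
  group_by(name) %>%
  summarize(p_average = mean(value >= 45 & value <= 55))
```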
Visualizing Posterior Draws
We want to know whether the posteriors of mean scores for different hair colors are different, and if any of them provide evidence that scores are “average” (between 45 and 55).
(For the following prediction examples we switch to a logistic regression, log_model, fit to the penguins data; its summary is below.)
 Family: bernoulli
Links: mu = logit
Formula: sex ~ bill_length_mm + bill_depth_mm
Data: penguins (Number of observations: 333)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -25.33 2.95 -31.16 -19.78 1.00 2111 1995
bill_length_mm 0.28 0.04 0.21 0.35 1.00 2323 2411
bill_depth_mm 0.78 0.10 0.60 0.98 1.00 2377 2333
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Linear Predictions are draws of the linear predictor \(\mathbf{X}\beta\): linear combinations of the columns of our design matrix, before the inverse link function maps them back to the response scale.
In logistic regression, this would be predictions on the log odds scale (“what are the predicted log odds of being registered to vote?”)
Linear Predictions
❓ we get a \(4000 \times 333\) dimensional matrix as output. Where do those two numbers come from???
dim(penguins)
[1] 333 8
```r
lin_preds <- posterior_linpred(log_model)
# lin_preds_long <- linpred_draws(log_model, newdata = penguins)  # get data in long format
dim(lin_preds)
```
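Since linear predictions from a logistic model live on the log-odds scale, plogis() (the inverse logit) maps them back to probabilities. A tiny simulated matrix stands in for lin_preds here, since the fitted model isn’t reproduced in this sketch:

```r
# simulated stand-in for posterior_linpred() output:
# rows = posterior draws, columns = observations, values on the log-odds scale
set.seed(3)
lin_preds <- matrix(rnorm(4000 * 5, mean = 0, sd = 2), nrow = 4000, ncol = 5)

# inverse logit, elementwise: log odds -> probabilities
probs <- plogis(lin_preds)

# per-observation posterior mean probability of the outcome
colMeans(probs)
```

With a real brms fit, posterior_epred(log_model) returns draws already on the probability scale, skipping the manual inverse-link step.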
Posterior Predictions are draws of data \(y_{rep}\): new samples of data generated from the posterior predictive distribution \(p(\tilde{y} \mid y)\).
In logistic regression, this would be new samples of binary (0/1) data that are based on the parameters learned by the model.
Posterior Predictions
```r
p_preds <- posterior_predict(log_model)
# p_preds_long <- predicted_draws(log_model, newdata = penguins)  # get data in long format
dim(p_preds)
```
[1] 4000 333
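One common use of these draws is a basic posterior predictive check: compute a statistic of each replicated dataset \(y_{rep}\) and compare it to the observed statistic. A simulated stand-in for p_preds (made-up binary data, not the real model output) shows the mechanics:

```r
# simulated stand-in for posterior_predict() output:
# rows = posterior draws, columns = observations, values are replicated 0/1 data
set.seed(4)
p_preds <- matrix(rbinom(4000 * 333, size = 1, prob = 0.5),
                  nrow = 4000, ncol = 333)

# one statistic per replicated dataset: the proportion of 1s
prop_rep <- rowMeans(p_preds)

# compare the distribution of this statistic to the observed proportion;
# if the observed value sits far in the tails, the model fits poorly
mean(prop_rep)
quantile(prop_rep, c(0.025, 0.975))
```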
Working With Posterior Distributions
Andrew Heiss made this great cheat sheet (though his notation is a little different from ours).
from Andrew Heiss’s Blog
Further Reading
Marginal Effects
Now that you know how to work with posterior draws and predictions, you’re ready to learn about Marginal Effects!
Vincent Arel-Bundock’s Marginal Effects Zoo
What are MEs?
Unfortunately, there’s not enough time to go into depth with Marginal Effects, but I want you to be aware of what they are, because they’ll put you head and shoulders above your peers when it comes to communicating your results with non-experts. Vincent Arel-Bundock’s book Model to Meaning, and Andrew Heiss’s blog Marginalia are great resources to jump in here. I’ll be pulling from both a lot here!
Shout out to Andrew Heiss’ blog again for allll the plotting code used in the examples in this subsection.
Why MEs?
Marginal Effects can help you explain/plot effects that are otherwise difficult to communicate.
Telling someone “The coefficient for \(x^2\) is \(5\)” is not something that will actually help them understand what the effect really means. Similarly, GLMs that use some confusing link functions can make it hard for non-experts (and experts) to really grok what effects indicate. Marginal Effects can help!
We covered about 0.001% of what Marginal Effects are; I highly recommend the book Model to Meaning by Vincent Arel-Bundock. He also wrote the marginaleffects package in R and Python, which works with both Frequentist and Bayesian models!
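As a minimal, self-contained taste of the package, here is avg_slopes() on a frequentist logistic model fit to the built-in mtcars data (the same call works on brms fits, where it returns posterior medians and credible intervals instead):

```r
library(marginaleffects)

# a small logistic regression: probability of a manual transmission (am = 1)
m <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# average marginal effect of each predictor on P(am = 1):
# an effect on the probability scale, not the log-odds scale,
# which is far easier to explain to a non-expert
avg_slopes(m)
```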